{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Unconstrained Non-Linear Programming\n", "## One dimensional problems set-up\n", "Let us first consider just one decision variable. The objective is to search a maximum or minimum of a non-linear function. Without loss of generalisation, let us consider that the objective is to minimize the objective function:\n", "\n", "$\\min f(x) \\quad x \\in \\mathbb{R}$\n", "\n", "Any (local or global) minimum of $f(x)$, $x^*$ will have the following properties:\n", "\n", "1. **First derivative test**: The first derivative of a function at a point $x^*$ represents the slope of a line tangent to $f(x)$ at $x^*$. In a minimum, the first derivative is equal to zero, $\\frac{df(x)}{dx}=0$, meaning that the tangent is horizontal.\n", "\n", "2. **Second derivative test**: The second derivative is positive $\\frac{d^2f(x)}{dx^2}$, meaning that the slope of the tangent is increasing and therefore, it must be less than zero for $x < x^*$, meaning that the function decreases before $x^*$, and greater than zero for $x > x^*$, meaning that the function increases after $x^*$. Therefore $f(x)$ takes its minimum value at $x^*$, at least in its proximity.\n", "\n", "By definition, $x^*$ is a critical point. For a maximum, the second derivative is negative and using the same logic, $f(x)$ decreases in the proximity of $x^*$.\n", "\n", "### Bi-section algorithm\n", "Since the sign of the first derivative changes at a critical point, it is possible to search a critical point if we find two initial points such that the sign of the first derivative is different. The simplest algorithm that uses this property is known as the bi-section method, and it can be formulated as this:\n", "\n", "**Initialisation**\n", "\n", "1. Choose an acceptable error $\\epsilon$\n", "2. Find an initial left point $x_L$ such that $\\frac{df(x_L)}{dx} < 0$\n", "3. Find an initial right point $x_R$ such that $\\frac{df(x_R)}{dx} > 0$\n", "\n", "**Iteration**\n", "1. Bisect the range $[x_L, x_R] \\quad x'=\\frac{𝑥_𝑅−𝑥_𝐿}{2}$\n", "2. If $\\frac{df(x')}{dx} < 0$ then make $𝑥_𝐿=x'$, else if $\\frac{df(x')}{dx} < 0$ then make $𝑥_𝑅=x'$\n", "3. If $𝑥_𝑅−𝑥_𝐿 \\leq 2\\epsilon$ then the solution is $\\frac{𝑥_𝑅−𝑥_𝐿}{2}$, else repeat\n", "\n", "That is, we start with two initial values for which the first derivative has different sign. If the function is continuously differentiable, then there must be a value somewhere in between these values where the first derivative is zero. Then, we make a bi-section and continue looking in the part of the range for which there is a change in the sign of the first derivative. The process stops either when we find exactly a critical point or when the distance between both points is less than the acceptable error. \n", "\n", "## Multidimensional problems set-up\n", "For multidimensional problems the logic applied is the same, except that in this case it is necessary to consider partial derivatives in the different decision variables or dimensions of the problem.\n", "Thus, given an n-dimensional function: \n", "\n", "$f(x) \\quad x=[x_1, x_2, ..., x_n]$ \n", "\n", "The **gradient vector** is defined as a vector containing the first partial derivatives: \n", "\n", "$\\nabla f(x) = \\left[\\begin{array}{c}\n", "\\dfrac{\\partial f(x)}{\\partial x_1}\\\\\n", "\\dfrac{\\partial f(x)}{\\partial x_2}\\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial f(x)}{\\partial x_n}\n", "\\end{array}\\right]$\n", "\n", "The gradient represents the slopes of the tangent planes to $f(x)$ in every dimension. For a minimum, the gradient must be zero, as an extension of the first derivative test of one dimensional problems.\n", "\n", "1. **Gradient test** The gradient must be zero at a critical point $x^*$ $\\nabla f(x^*) = [0, ..., 0]^T$.\n", "\n", "Now, to know if the function decreases **in every direction** around $x^*$, we need ot define the **Hessian**:\n", "\n", "$\\text{Hess} f(x)= \n", " \\begin{bmatrix}\n", " \\frac{d^2f(x)}{dx_1^2} & \\frac{d^2f(x)}{dx_1dx_2} & ... & \\frac{d^2f(x)}{dx_1dx_n} \\\\\n", " \\frac{d^2f(x)}{dx_2^dx_1} &\\frac{d^2f(x)}{dx_2^2} & ... & \\frac{d^2f(x)}{dx2dx_n} \\\\\n", " \\vdots \\\\\n", " \\frac{d^2f(x)}{dx_n^dx_1} &\\frac{d^2f(x)}{dx_ndx_2} & ... & \\frac{d^2f(x)}{dx_n^2}\n", " \\end{bmatrix}$ \n", " \n", " $\\text{Hess} f(x)= \\begin{bmatrix}\n", " h_{11} & h_{12} & ... & h_{1n} \\\\\n", " h_{21} & h_{22} & ... & h_{2n} \\\\ \n", " \\vdots \\\\\n", " h_{n1} & h_{n2} & ... & h_{nn}\n", " \\end{bmatrix}$\n", "\n", "That is, the Hessian is a nxn matrix that contains the second order derivatives. \n", "\n", "## Sylvester's criterion\n", "To determine how the function increases or decreases in any direction around a given point, we can check **Sylvester's criterion**. This criterion evaluates the **leading principal minors** of the Hessian matrix. A [leading principal minor](https://en.wikipedia.org/wiki/Minor_(linear_algebra)) $k$ of a Hessian matrix is the determinant of the square sub-matrix that includes columns and rows from 1 to $k$.\n", "\n", "### Sylvester's criterion for a 2 x 2 Hessian matrix\n", "For a 2 x 2 Hessian matrix as:\n", "\n", " $\\text{Hess} f(x)=\n", " \\begin{bmatrix}\n", " h_{11} & h_{12} \\\\\n", " h_{21} & h_{22}\n", " \\end{bmatrix}$\n", "\n", "we can define the leading principal minors as:\n", "\n", "$H_1 = h_{11}$\n", "$H_2 = \\det \\text{Hess} f(x) = \\begin{vmatrix}\n", " h_{11} & h_{12} \\\\\n", " h_{21} & h_{22}\n", " \\end{vmatrix} = h_{11}h_{22} - h_{12}h_{21}$\n", "\n", "Sylvester's criterion states that:\n", "- If $H_1 > 0$ and $H_2 > 0$ then the function is convex and $x^*$ is a global minimum. The Hessian is **positive definite**. Intuitively, this means that every small movement in any direction from $x^*$ will increase the value of the function, meaning that the function bends upwards in every direction around this point (like a cup $\\cup$), making it a strict local minimum.\n", "- If the leading principal minors alternate signs ($H_1 < 0$ and $H_2 > 0$) then the function is concave and $x^*$ is a global maximum. The Hessian is **negative definite**. For comparison,this means that the function looks like a cap $\\cap$ around $x^*$, meaning that every small movement in any direction from $x^*$ will decrease the value of the function, making it a strict local maximum.\n", "\n", "\n", "\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "source": [], "metadata": { "collapsed": false } } } }, "nbformat": 4, "nbformat_minor": 0 }